In general, Leo suggests using a ggplot-related book or reference (below). His suggestion was to look through the book (the website is a little harder to browse) and find plots you'd like to make. From there, copy the code and edit it for your own use.
For basic graphics help, and a good place to start when learning graphics in R, see: https://r-graphics.org/
## Before working, uncomment the following line so that the required libraries are installed
#install.packages(c("ggplot2", "cowplot", "ggpubr", "GGally", "reshape", "plotly", "Polychrome", "hexbin"))
To create a scatter plot use plot().
plot(mtcars$wt, mtcars$mpg) #mtcars is base data that comes with R
The mtcars$wt returns the column named wt from the mtcars data frame, and mtcars$mpg is the mpg column.
With ggplot2, you can get a similar result using the ggplot() function.
library(ggplot2) # This will only work if you've installed ggplot2!
x <- ggplot(mtcars, aes_string(x='wt', y='mpg')) + geom_point()
print(x)
ggplot() creates the plot object. geom_point() adds the layer of points to the plot.
Use ggplot() by passing it a data frame and telling it which columns to use. The key difference between Python and R here is that aes() in R expects an object (a bare column name), not a string.
ggplot(mtcars, aes(x=wt, y=mpg))
If you pass a string to aes(), this will not work, which is why I used aes_string. This also helps with loop input, which is often a string and not an object.
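As a hedged aside on the string-versus-object point: newer ggplot2 releases mark aes_string() as deprecated, and the .data pronoun is one way to map a column whose name is held in a string. The sketch below only illustrates that idea; the names y_vars and plots are made up for this example and are not part of the original notebook.

```r
library(ggplot2)

# Column names held as strings, e.g. the kind of input a loop would provide
# (y_vars is a hypothetical name used only for this illustration)
y_vars <- c("mpg", "hp", "qsec")

# Build one scatter plot per column name; .data[[v]] looks the column up by string
plots <- lapply(y_vars, function(v) {
  ggplot(mtcars, aes(x = wt, y = .data[[v]])) +
    geom_point() +
    ylab(v)
})
names(plots) <- y_vars

# Show one of the plots
print(plots[["hp"]])
```

Each element of plots is an ordinary ggplot object, so it can be printed, saved, or extended with more layers like any other plot.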
Leo assigned the ggplot() object to a variable x and examined that variable. This was very innovative.
#str(x) ## Commented out to shorten notebook
The cowplot package is a simple add-on to ggplot. It provides various features that help with creating publication-quality figures, such as a set of themes, functions to align plots and arrange them into complex compound figures, and functions that make it easy to annotate plots and or mix plots with images. The package was originally written for internal use in my lab, to provide my students and postdocs with the tools to make high-quality figures for their publications. I have also used the package extensively in my book Fundamentals of Data Visualization. This introductory vignette provides a brief glance at the key features of the package. - Claus O Wilke
For more complete documentation, read:
library(cowplot)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
geom_point()
Generate a simple and clean theme with cowplot: theme_cowplot().
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
geom_point() +
theme_cowplot(12)
Another use of cowplot is to annotate and arrange plots. This is currently R-specific.
Side note: This might be a good place to write a Python package that mirrors cowplot's functionality.
library(repr) # Adjust R kernel plot size
# Change plot size to 10 x 5
options(repr.plot.width=10, repr.plot.height=5)
p1 <- ggplot(mtcars, aes(disp, mpg)) +
geom_point() + theme_cowplot(12)
p2 <- ggplot(mtcars, aes(qsec, mpg)) +
geom_point() + theme_cowplot(12)
plot_grid(p1, p2, labels = c('A', 'B'), label_size = 14)
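A small variation on the arrangement above, offered as a sketch rather than as part of the original session: plot_grid() can also stack panels in a single column, label them automatically, and the combined figure can be written to disk with ggsave(). The object name stacked and the output path out_file are assumptions made for this example.

```r
library(ggplot2)
library(cowplot)

p1 <- ggplot(mtcars, aes(disp, mpg)) +
  geom_point() + theme_cowplot(12)
p2 <- ggplot(mtcars, aes(qsec, mpg)) +
  geom_point() + theme_cowplot(12)

# Stack the panels in one column, label them A and B automatically,
# and align the plot areas vertically
stacked <- plot_grid(p1, p2, ncol = 1, labels = "AUTO",
                     label_size = 14, align = "v")

# out_file is a hypothetical output path inside the session's temp directory
out_file <- file.path(tempdir(), "mtcars_panels.pdf")
ggsave(out_file, stacked, width = 5, height = 8)
```

Saving to PDF keeps the figure as vector graphics, which is usually what a journal wants; change the file extension if a raster format is needed.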
GGally functions that extend ggplot():
GGally::ggcoef, plot the coefficients of a model
GGally::ggduo, display two grouped data in a plot matrix
GGally::ggmatrix, managing multiple plots in a matrix-like layout
GGally::ggnetworkmap, plotting elegant maps using ggplot
GGally::ggpairs, special form of a ggmatrix that produces a pairwise comparison of multivariate data
options(repr.plot.width=16, repr.plot.height=16)
data(tips, package = "reshape") # Only reason we installed `reshape` and not `reshape2`
pm <- GGally::ggpairs(tips)
pm
ggpairs(), part of the GGally package, is useful for getting a visual idea of how all the variables compare against each other.
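The full matrix can get crowded and slow to draw when a data frame has many columns. One possible way to narrow it down, sketched here with an arbitrary choice of columns (cols and pm_small are hypothetical names for this example), is to pass a subset of columns and color by a grouping variable:

```r
library(ggplot2)
library(GGally)

data(tips, package = "reshape")

# Hypothetical choice of columns, picked only to keep the matrix small
cols <- c("total_bill", "tip", "size")

# Pairwise comparison of the selected columns, colored by the sex column
pm_small <- ggpairs(tips, columns = cols, mapping = aes(color = sex))
pm_small
```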
The ‘ggpubr’ package provides some easy-to-use functions for creating and customizing ‘ggplot2’-based publication-ready plots.
options(repr.plot.width=7, repr.plot.height=7)
library(ggpubr)
# Create some data format
# :::::::::::::::::::::::::::::::::::::::::::::::::::
set.seed(1234)
wdata = data.frame(
sex = factor(rep(c("F", "M"), each=200)),
weight = c(rnorm(200, 55), rnorm(200, 58)))
head(wdata, 4)
# Density plot with mean lines and marginal rug
# :::::::::::::::::::::::::::::::::::::::::::::::::::
# Change outline and fill colors by groups ("sex")
# Use custom palette
ggdensity(wdata, x = "weight",
add = "mean", rug = TRUE,
color = "sex", fill = "sex",
palette = c("#00AFBB", "#E7B800"))
# Histogram plot with mean lines and marginal rug
# :::::::::::::::::::::::::::::::::::::::::::::::::::
# Change outline and fill colors by groups ("sex")
# Use custom color palette
gghistogram(wdata, x = "weight",
add = "mean", rug = TRUE,
color = "sex", fill = "sex",
palette = c("#00AFBB", "#E7B800"))
# Load data
data("ToothGrowth")
df <- ToothGrowth
head(df, 4)
# Box plots with jittered points
# :::::::::::::::::::::::::::::::::::::::::::::::::::
# Change outline colors by groups: dose
# Use custom color palette
# Add jitter points and change the shape by groups
p <- ggboxplot(df, x = "dose", y = "len",
color = "dose", palette =c("#00AFBB", "#E7B800", "#FC4E07"),
add = "jitter", shape = "dose")
p
# Add p-values comparing groups
# Specify the comparisons you want
my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
p + stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
stat_compare_means(label.y = 50) # Add global p-value
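To see the numbers behind the p-value labels, ggpubr also has compare_means(), which returns the comparisons as a table. This is a sketch for inspection only; as far as I know its default test is the Wilcoxon test, and the Kruskal-Wallis call is my assumption about what a global test across three groups would typically be. The names pairwise and overall are made up for this example.

```r
library(ggpubr)

data("ToothGrowth")

# Pairwise comparisons between the three dose groups
# (uses the function's default test)
pairwise <- compare_means(len ~ dose, data = ToothGrowth)
print(pairwise)

# One global test across all dose groups
overall <- compare_means(len ~ dose, data = ToothGrowth,
                         method = "kruskal.test")
print(overall)
```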
# Violin plots with box plots inside
# :::::::::::::::::::::::::::::::::::::::::::::::::::
# Change fill color by groups: dose
# add boxplot with white fill color
ggviolin(df, x = "dose", y = "len", fill = "dose",
palette = c("#00AFBB", "#E7B800", "#FC4E07"),
add = "boxplot", add.params = list(fill = "white"))+
stat_compare_means(comparisons = my_comparisons, label = "p.signif")+ # Add significance levels
stat_compare_means(label.y = 50)
# Load data
data("mtcars")
dfm <- mtcars
# Convert the cyl variable to a factor
dfm$cyl <- as.factor(dfm$cyl)
# Add the name column
dfm$name <- rownames(dfm)
# Inspect the data
head(dfm[, c("name", "wt", "mpg", "cyl")])
ggbarplot(dfm, x = "name", y = "mpg",
fill = "cyl", # change fill color by cyl
color = "white", # Set bar border colors to white
palette = "jco", # jco journal color palette. see ?ggpar
sort.val = "desc", # Sort the value in descending order
sort.by.groups = FALSE, # Don't sort inside each group
x.text.angle = 90 # Rotate vertically x axis texts
)
ggbarplot(dfm, x = "name", y = "mpg",
fill = "cyl", # change fill color by cyl
color = "white", # Set bar border colors to white
palette = "jco", # jco journal color palette. see ?ggpar
sort.val = "asc", # Sort the value in ascending order
sort.by.groups = TRUE, # Sort inside each group
x.text.angle = 90 # Rotate vertically x axis texts
)
# Calculate the z-score of the mpg data
dfm$mpg_z <- (dfm$mpg -mean(dfm$mpg))/sd(dfm$mpg)
dfm$mpg_grp <- factor(ifelse(dfm$mpg_z < 0, "low", "high"),
levels = c("low", "high"))
# Inspect the data
head(dfm[, c("name", "wt", "mpg", "mpg_z", "mpg_grp", "cyl")])
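As a quick sanity check on the z-score step, here is a small sketch comparing the manual formula with base R's scale(), which centers by the mean and divides by the standard deviation by default. The names z_manual and z_scale are made up for this example.

```r
# Manual z-score: subtract the mean, divide by the standard deviation
z_manual <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

# scale() returns a one-column matrix, so flatten it to a plain vector
z_scale <- as.numeric(scale(mtcars$mpg))

# The two approaches should agree up to floating point error
print(all.equal(z_manual, z_scale))

# A z-scored variable has mean 0 and standard deviation 1
print(c(mean = mean(z_manual), sd = sd(z_manual)))
```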
ggbarplot(dfm, x = "name", y = "mpg_z",
fill = "mpg_grp", # change fill color by mpg_level
color = "white", # Set bar border colors to white
palette = "jco", # jco journal color palette. see ?ggpar
sort.val = "asc", # Sort the value in ascending order
sort.by.groups = FALSE, # Don't sort inside each group
x.text.angle = 90, # Rotate vertically x axis texts
ylab = "MPG z-score",
xlab = FALSE,
legend.title = "MPG Group"
)
ggbarplot(dfm, x = "name", y = "mpg_z",
fill = "mpg_grp", # change fill color by mpg_level
color = "white", # Set bar border colors to white
palette = "jco", # jco journal color palette. see ?ggpar
sort.val = "desc", # Sort the value in descending order
sort.by.groups = FALSE, # Don't sort inside each group
x.text.angle = 90, # Rotate vertically x axis texts
ylab = "MPG z-score",
legend.title = "MPG Group",
rotate = TRUE,
ggtheme = theme_minimal()
)
Polychrome is a tool for creating, viewing, and assessing qualitative palettes with many (20-30 or more) colors. This is important because the color palettes were updated in the new version of R.
Note: If currently writing a paper, try not to upgrade R. Otherwise, you might have to redo the colors.
library(Polychrome) # https://rdrr.io/cran/Polychrome/f/vignettes/polychrome.Rmd
mypal <- kelly.colors(22)
swatch(mypal)
Ren asked a question about color-blind-friendly palettes, and Leo pointed out viridisLite. I normally just use viridis.
Matplotlib recently introduced new color maps for their graphs. They are called viridis, magma, inferno, and plasma. viridis was made the new default color map of Matplotlib.
NOTE:
viridisLite is the 'lite' version of the more complete viridis package. viridisLite contains only the core functions of viridis that generate the color vectors for each of the aforementioned color maps. It does not have any of the other features of the full viridis package (e.g. scale functions for ggplot2). This was requested by users of viridis who did not want to have to import the dependencies of viridis but still wanted to be able to use the color maps it provides.
#install.packages("viridisLite")
library(viridisLite)
library(hexbin)
dat <- data.frame(x = rnorm(10000), y = rnorm(10000))
ggplot(dat, aes(x = x, y = y)) +
geom_hex() + coord_fixed() +
scale_fill_gradientn(colours = viridis(256, option = "D"))
# using code from RColorBrewer to demo the palette
n = 200
image(1:n, 1, as.matrix(1:n),
col = viridis(n, option = "D"),
xlab = "viridis n", ylab = "", xaxt = "n", yaxt = "n", bty = "n")
Use the color scales in this package to make plots that are pretty, better represent your data, easier to read by those with colorblindness, and print well in grey scale.
#install.packages("viridis")
library(viridis)
x <- y <- seq(-8*pi, 8*pi, len = 40)
r <- sqrt(outer(x^2, y^2, "+"))
filled.contour(cos(r^2)*exp(-r/(2*pi)),
axes=FALSE,
color.palette=viridis,
asp=1)
ggplot(data.frame(x = rnorm(10000), y = rnorm(10000)), aes(x = x, y = y)) +
geom_hex() + coord_fixed() +
scale_fill_viridis() + theme_bw()
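The examples above all fill by a continuous value. For a categorical variable, the full viridis package's ggplot2 scales take a discrete argument. A minimal sketch using the iris data from earlier (discrete_plot and three_cols are names made up for this example):

```r
library(ggplot2)
library(viridis)

# Color the points by a categorical variable with a discrete viridis scale
discrete_plot <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_viridis(discrete = TRUE, option = "D") +
  theme_bw()
discrete_plot

# The raw hex codes for three colors, for use outside of ggplot2
three_cols <- viridis(3)
print(three_cols)
```

The hex codes from viridis(3) can be passed to base graphics or to any function that accepts a vector of colors.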
Notes:
library(plotly)
g <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
stat_density_2d(aes(fill = ..level..), geom = "polygon") +
xlim(1, 6) + ylim(40, 100)
ggplotly(g)
This seemed to be very computationally expensive, so I have not added it to the notebook.
#install.packages('sessioninfo')
print('Reproducibility information:')
Sys.time()
proc.time()
options(width = 120)
sessioninfo::session_info()